SI649 F24 Altair Homework #4

Overview

We'll focus on maps and cartographic visualization. In this lab, you will practice:

  • Point Maps
  • Symbol Maps
  • Choropleth maps
  • Interactions with maps

Lab Instructions

  • Save, rename, and submit the ipynb file (use your username in the name).
  • Complete all the checkpoints to create the required visualization in each cell.
  • Run every cell (use Runtime -> Restart and run all to make sure you have a clean working version), print to PDF, and submit the PDF file.
  • If you end up stuck, show us your work by including links (URLs) to the resources you consulted. You'll get partial credit for showing your work in progress.
In [79]:
import pandas as pd
import altair as alt
from vega_datasets import data

alt.data_transformers.disable_max_rows()

df = pd.read_csv('https://raw.githubusercontent.com/dallascard/si649_public/main/altair_hw4/airports.csv')
url = "https://raw.githubusercontent.com/dallascard/si649_public/main/altair_hw4/small-airports.json"

Visualization 1: Dot Density Map

vis1 Description of the visualization:

We want to visualize the density of small airports around the world. Each small airport is represented by a dot. The visualization has two layers, plus a tooltip:

  • The base layer shows the outline of the world map.
  • The point map shows different small airports.
  • The tooltip shows the name of the airport.

Hint:

  • How can we show continents on the map? Which object from the JSON dataset can be used?
  • How can we show only small airports on the map?
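One way to approach the second hint is a boolean mask in pandas. The sketch below uses a tiny made-up table (the rows are illustrative and not taken from airports.csv) and assumes the category column is named `type` with the value `small_airport`.

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data comes from airports.csv.
toy = pd.DataFrame({
    "name": ["Alpha Field", "Bravo Intl", "Charlie Strip"],
    "type": ["small_airport", "large_airport", "small_airport"],
})

# Boolean mask: True where the row is a small airport.
mask = toy["type"] == "small_airport"
small = toy[mask]

print(small["name"].tolist())
```

Filtering in pandas before building the chart keeps the chart specification small. Filtering inside the chart with a filter transform is another option if the full table is needed elsewhere.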
In [80]:
small_airports = df[df['type'] == 'small_airport']

# Load the world map data
world_map = alt.topo_feature(data.world_110m.url, 'countries')

# A base layer with the world map
base = alt.Chart(world_map).mark_geoshape(
    fill='lightgrey',
    stroke='white'
).transform_filter(
    alt.datum.id != 10  # Exclude Antarctica (id 10)
).properties(
    width=800,
)

# Add the points for small airports
points = alt.Chart(small_airports).mark_circle(size=10, color='red').encode(
    longitude='longitude_deg:Q',
    latitude='latitude_deg:Q',
    tooltip=['name:N', 'municipality:N']
)

final_map = base + points
final_map
Out[80]:

Visualization 2: Proportional Symbol

vis2 Description of the visualization:

The visualization shows faceted maps marking the 20 most populous cities in the world by 2100, one map per year. Each map has two layers, plus a tooltip:

  • The base layer shows the map of countries.
  • The second layer shows size-encoded points indicating the population of those cities.
  • The tooltip shows the city name and population.

Hint:

  • Which projection has been used in individual charts?
  • How can you create a faceted chart with one map per year and 2 columns?
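If the maps are built one per year and then concatenated, the two-column layout comes down to splitting a list into rows of two. The helper below is a minimal sketch in plain Python; `chunk` is a hypothetical name, and the strings stand in for chart objects.

```python
def chunk(items, n_cols):
    """Split a list into consecutive rows of at most n_cols items."""
    return [items[i:i + n_cols] for i in range(0, len(items), n_cols)]

# Stand-ins for five per-year charts.
rows = chunk(["y1", "y2", "y3", "y4", "y5"], 2)
print(rows)
```

With Altair, rows built this way could be combined as `alt.vconcat(*[alt.hconcat(*row) for row in rows])`, which avoids hard-coding the number of years.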
In [81]:
countries_url = data.world_110m.url
source = 'https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw4/population_prediction.csv'
In [82]:
data1 = pd.read_csv(source)

sorted_cities = data1.sort_values(['year', 'population'], ascending=[True, False])
top_cities_each_year = sorted_cities.groupby('year').head(20).reset_index(drop=True)

countries_url = alt.topo_feature(
    'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/world-110m.json', 'countries'
)

def create_map(year):
    year_data = top_cities_each_year[top_cities_each_year['year'] == year]
    base_map = alt.Chart(countries_url).mark_geoshape(
        fill='lightgray',
        stroke='white'
    ).project(
        type='equirectangular'
    ).properties(
        width=300,
        height=200
    )
    
    points = alt.Chart(year_data).mark_circle(color='green').encode(
        longitude='lon:Q',
        latitude='lat:Q',
        size=alt.Size('population:Q', scale=alt.Scale(range=[10, 300]), title='Population (million)'),
        tooltip=['city:N', 'population:Q', 'year:O']
    )
    
    annotation = alt.Chart(pd.DataFrame({'year': [year]})).mark_text(
        align='center',
        fontSize=10,
        dy=-170
    ).encode(
        text='year:O'
    ).properties(
        width=300,
        height=200
    )
    
    return base_map + points + annotation

years = top_cities_each_year['year'].unique()
map_layers = [create_map(year) for year in years]

final = alt.vconcat(
    alt.hconcat(map_layers[0], map_layers[1]),
    alt.hconcat(map_layers[2], map_layers[3]),
    map_layers[4]
).properties(
    title='The 20 Most Populous Cities in the World by 2100'
)

final
Out[82]:

Visualization 3: Hurricane Trajectories

vis3 Description of the visualization:

Create a map that shows the paths (trajectories) of the 2017 hurricanes. Filter the data so that only 2017 hurricanes are shown. Remove Alaska and Hawaii from the map (Filter out ids 2 and 15).

Hint:

  • How will you filter out 2017 hurricanes?
  • Which object can be used to show state boundaries?
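For the first hint, one option is to parse the `datetime` strings and compare the year. The sketch below uses made-up track points (not rows from hurdat2.csv) and assumes ISO 8601 timestamps and an `identifier` column, matching the sample output shown for the real data.

```python
import pandas as pd

# Hypothetical track points for illustration; the real data comes from hurdat2.csv.
toy = pd.DataFrame({
    "identifier": ["AL012016", "AL012017", "AL012017", "AL022017"],
    "datetime": ["2016-01-12T18:00:00", "2017-04-19T00:00:00",
                 "2017-04-19T06:00:00", "2017-06-19T18:00:00"],
})

# Parse the ISO 8601 strings, then keep rows whose year is 2017.
toy["year"] = pd.to_datetime(toy["datetime"]).dt.year
toy_2017 = toy[toy["year"] == 2017]

# Count points per storm; each identifier corresponds to one trajectory.
points_per_storm = toy_2017.groupby("identifier").size().to_dict()
print(points_per_storm)
```

Because each identifier is a separate storm, a line mark needs some grouping channel (for example `detail`) keyed on the identifier. Otherwise all points are connected as one path.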
In [83]:
states_url = data.us_10m.url
hurricane_data = pd.read_csv('https://raw.githubusercontent.com/dallascard/SI649_public/main/altair_hw4/hurdat2.csv')
hurricane_data.sample(3)
Out[83]:
       identifier     name  num_pts  record_id  status  latitude  longitude  max_wind  min_pressure             datetime
20023    AL031943  UNNAMED       34        NaN      TS      14.1      -56.2        55          -999  1943-08-19T12:00:00
48702    AL082014  GONZALO       39          L      HU      18.1      -63.0        75           984  2014-10-13T22:45:00
46961    AL032011    CINDY       13        NaN      TS      43.8      -41.2        50           997  2011-07-22T06:00:00
In [84]:
# Filter for 2017 hurricanes
hurricane_data['year'] = pd.to_datetime(hurricane_data['datetime']).dt.year
hurricane_2017 = hurricane_data[hurricane_data['year'] == 2017]

# Load state boundaries
states_url = alt.Data(url='https://vega.github.io/vega-datasets/data/us-10m.json', format=alt.DataFormat(type='topojson', feature='states'))

# Base map
base = alt.Chart(states_url).mark_geoshape(
    fill='lightgray',
    stroke='black'
).transform_filter(
    alt.datum.id != 2  # Exclude Alaska
).transform_filter(
    alt.datum.id != 15  # Exclude Hawaii
).project(
    type='mercator'
).properties(
    width=800,
    height=500
)

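# Hurricane trajectories: one line per storm, grouped by identifier
trajectories = alt.Chart(hurricane_2017).mark_line(color='blue').encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    detail='identifier:N'  # without this, all storms are joined into a single line
)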

map_with_trajectories = base + trajectories
map_with_trajectories
Out[84]:

Visualization 4: Choropleth Map

vis4

Interaction

vis4

Description of the visualization:

The visualization has a choropleth map showing the population of different states and a sorted bar chart showing the top 15 states by population. These charts are connected using a click interaction.

Hint:

  • Which object can be used to show states on the map?
  • Which transform can be used to add population data to the geographic data? How can we combine two datasets in Altair?
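Conceptually, a lookup transform behaves like a left join: each geographic feature keeps its place, and fields from a second table are copied in where the keys match. The sketch below shows that idea in plain Python; it is not how Altair implements it. The two state rows reuse values from the sample output in this notebook, and the feature with id 99 is made up to show an unmatched key.

```python
# Stand-ins for map features, keyed by id. Id 99 is hypothetical and has no match.
features = [{"id": 1}, {"id": 28}, {"id": 99}]

# Secondary table holding the fields we want to attach.
population_rows = [
    {"id": 1, "state": "Alabama", "population": 4863300},
    {"id": 28, "state": "Mississippi", "population": 2988726},
]

# Index the secondary table by its key for fast lookups.
by_id = {row["id"]: row for row in population_rows}

# For each feature, copy the requested fields from the matching row, if any.
fields = ["state", "population"]
joined = []
for feature in features:
    match = by_id.get(feature["id"], {})
    joined.append({**feature, **{f: match.get(f) for f in fields}})

print(joined)
```

Features without a match end up with empty values for the copied fields, so on a choropleth they would have no population to color by.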
In [85]:
state_map = data.us_10m.url
state_pop = data.population_engineers_hurricanes()[['state', 'id', 'population']]
state_pop.sample(5)
Out[85]:
           state  id  population
0        Alabama   1     4863300
41  South Dakota  46      865454
24   Mississippi  28     2988726
23     Minnesota  27     5519952
27      Nebraska  31     1907116
In [86]:
state_map = alt.topo_feature(data.us_10m.url, 'states')
state_pop = data.population_engineers_hurricanes()[['state', 'id', 'population']]
selection = alt.selection_point(fields=['id']) 

# Choropleth map
map_chart = alt.Chart(state_map).mark_geoshape().encode(
    color=alt.condition(selection, alt.Color('population:Q', scale=alt.Scale(scheme='yellowgreenblue')), alt.value('lightgray')),
    tooltip=['state:N', 'population:Q']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(state_pop, 'id', ['state', 'population'])
).project(
    type='albersUsa'
).add_params(
    selection
).properties(
    width=400,
    height=300,
    title='Choropleth Map of States by Population'
)

# Bar chart
bar_chart = alt.Chart(state_pop).mark_bar().encode(
    y=alt.Y('state:N', sort='x', title='State'),
    x=alt.X('population:Q', title='Population'),
    color=alt.condition(selection, alt.Color('population:Q', scale=alt.Scale(scheme='yellowgreenblue')), alt.value('lightgray')),
    tooltip=['state:N', 'population:Q']
).add_params(
    selection
).transform_window(
    rank='rank(population)',
    sort=[alt.SortField('population', order='descending')]
).transform_filter(
    'datum.rank <= 15'
).properties(
    width=400,
    height=300,
    title='Top 15 States by Population'
)

# Combine the two charts
final_chart = map_chart | bar_chart
final_chart
Out[86]: